AITopics | bellman error

bellman error

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Neural Information Processing Systems | Apr-25-2026, 06:35:46 GMT

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories.
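The failure mode described above can be illustrated with a small sketch. It is not the paper's method or its off-policy correction; it only shows, under assumed toy numbers, how a depth-1 tree search that trusts a learned value function can be steered toward a rarely visited state whose value is overestimated. The environment, the state names (`s_seen`, `s_unseen`), and the function `one_step_tree_search` are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable


def one_step_tree_search(
    state: State,
    actions: Iterable[Action],
    step: Callable[[State, Action], Tuple[State, float]],
    value_fn: Callable[[State], float],
    gamma: float,
) -> Tuple[Action, Dict[Action, float]]:
    """Score each action by r + gamma * V(s') and return the best one."""
    scores: Dict[Action, float] = {}
    for action in actions:
        next_state, reward = step(state, action)  # exact model, as in the abstract
        scores[action] = reward + gamma * value_fn(next_state)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical deterministic environment: two actions, zero immediate reward.
TRANSITIONS = {
    ("s0", "left"): ("s_seen", 0.0),
    ("s0", "right"): ("s_unseen", 0.0),
}

# Assumed values: the learned estimate is accurate where the agent has been,
# and badly overestimated on the state it rarely visited during training.
TRUE_VALUE = {"s_seen": 1.0, "s_unseen": 0.2}
LEARNED_VALUE = {"s_seen": 0.9, "s_unseen": 1.5}


def step(state: State, action: Action) -> Tuple[State, float]:
    return TRANSITIONS[(state, action)]


ACTIONS = ["left", "right"]
best_true, _ = one_step_tree_search("s0", ACTIONS, step, TRUE_VALUE.__getitem__, 0.9)
best_learned, _ = one_step_tree_search("s0", ACTIONS, step, LEARNED_VALUE.__getitem__, 0.9)
print(best_true, best_learned)
```

With the true values the search picks `left`; with the learned values it picks `right`, moving the agent into exactly the region where the estimate is least reliable.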

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bellman Residual Orthogonalization for Offline Reinforcement Learning

Neural Information Processing Systems | Apr-24-2026, 18:11:15 GMT

We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.27)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bellman Residual Orthogonalization for Offline Reinforcement Learning

Andrea Zanette

Neural Information Processing Systems | Apr-24-2026, 18:11:12 GMT

We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
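As a rough illustration of enforcing the Bellman equations only along test functions, the sketch below handles the simplest linear case: a value estimate of the form phi^T w, and as many test functions as features, so the orthogonality conditions form a square linear system. When the test functions equal the features, this coincides with the classical LSTD estimator. It does not reproduce the paper's confidence intervals or policy optimization; the function names, the data layout, and the two-state example are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (test function values f, features phi, reward r, next features phi')
Transition = Tuple[Vector, Vector, float, Vector]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: test functions do not pin down w")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weak_bellman_solution(data: Sequence[Transition], gamma: float) -> List[float]:
    """Find w such that the empirical Bellman residual
    phi^T w - r - gamma * phi'^T w is orthogonal to every test function."""
    n = len(data)
    k = len(data[0][0])  # number of test functions
    d = len(data[0][1])  # number of features
    if k != d:
        raise ValueError("this sketch only covers the square case k == d")
    a = [[0.0] * d for _ in range(k)]
    b = [0.0] * k
    for f, phi, reward, phi_next in data:
        for j in range(k):
            b[j] += f[j] * reward / n
            for i in range(d):
                a[j][i] += f[j] * (phi[i] - gamma * phi_next[i]) / n
    return solve_linear(a, b)


# Hypothetical two-state chain with one-hot features: A -> B (reward 0),
# B -> B (reward 1). Test functions are taken equal to the features.
A, B = [1.0, 0.0], [0.0, 1.0]
dataset = [(A, A, 0.0, B), (B, B, 1.0, B)]
w = weak_bellman_solution(dataset, gamma=0.5)
print(w)
```

In this toy chain the exact values are V(B) = 1 / (1 - 0.5) = 2 and V(A) = 0.5 * 2 = 1, which the solved weights recover because one-hot features can represent the value function exactly.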

arxiv preprint arxiv, machine learning, reinforcement learning, (12 more...)

Country: